Skip to content

Fix: expand dict transformation#9561

Open
Shamik-07 wants to merge 20 commits into
marimo-team:mainfrom
Shamik-07:fix/expand_dict_transformation
Open

Fix: expand dict transformation#9561
Shamik-07 wants to merge 20 commits into
marimo-team:mainfrom
Shamik-07:fix/expand_dict_transformation

Conversation

@Shamik-07
Copy link
Copy Markdown
Contributor

@Shamik-07 Shamik-07 commented May 15, 2026

📝 Summary

Using narwahls to convert all backend to polars and then using the unnest function of polars for expanding the dict and then convert it back to the original backend.
Closes #4583

Screen.Recording.2026-05-15.at.18.07.461.mov

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 22, 2026 9:32pm

Request Review

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Architecture diagram
sequenceDiagram
    participant User as User (Marimo UI)
    participant DFPlugin as Dataframe Plugin
    participant Handler as ExpandDict Handler (handlers.py)
    participant Narwhals as Narwhals Layer
    participant Polars as Polars Engine
    participant Backend as Original Backend (pandas/Ibis/other)

    Note over User,Backend: Expand Dict Transformation Flow

    User->>DFPlugin: Trigger expand dict on column
    DFPlugin->>Handler: handle_expand_dict(df, transform)

    Handler->>Narwhals: collect_and_preserve_type(df)
    Narwhals->>Backend: Collect actual data from original backend
    Backend-->>Narwhals: Data as native type
    Narwhals-->>Handler: Collected DataFrame + undo function

    Handler->>Polars: collected_df.to_polars()
    Note over Handler,Polars: Convert to Polars for unnest support

    Polars->>Polars: polars_df.unnest(column_id)
    Note over Polars: Handles null dict values correctly

    Polars-->>Handler: Unnested Polars DataFrame

    Handler->>Narwhals: nw.from_native(unnested)
    Narwhals->>Handler: Narwhals wrapper

    Handler->>Handler: undo(narwhals_df)
    Note over Handler: Convert back to original backend type

    Handler-->>DFPlugin: Transformed DataFrame
    DFPlugin-->>User: Updated table with expanded columns
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Re-trigger cubic

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
{"A": [{"foo": 1, "bar": "hello"}], "B": [1]}
{"A": [{"foo": 1, "bar": "hello"}], "B": [1]},
),
create_test_dataframes(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polars should already be created in this create_test_dataframes. so we dont need to add the dataframe below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor

@mscolnick mscolnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to take an optional dep on polars

Shamik-07 added 3 commits May 19, 2026 17:45
… rows with the create test dataframes instead.
refactor: adding None == NaN in assert frame equal with nans method to use it in expand_dict test.
@Shamik-07
Copy link
Copy Markdown
Contributor Author

need to take an optional dep on polars

Done.
Added allow_none_equals_nan for assert_frame_equal_with_nans as None!=NaN, which was causing the use of assert_frame_equal_with_nans to fail for test_expand_dict.

@Shamik-07 Shamik-07 requested a review from mscolnick May 19, 2026 22:30
@Shamik-07
Copy link
Copy Markdown
Contributor Author

There are some pandas CI errors that i am looking into.

@Shamik-07
Copy link
Copy Markdown
Contributor Author

There are some pandas CI errors that i am looking into.

This is happening because of data conversion mismatch between pandas and narwahls with mixed data types

"Could not convert '3' with type str: tried to convert to double"

for

test_print_code_result_matches_actual_transform_pandas(
    transform=ExpandDictTransform(
        type=TransformType.EXPAND_DICT,
        column_id='strings',
    ),
)

So my only option is to fallback to pandas backend processing separately for the unnest.
This should fix it.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Bundle Report

Bundle size has no change ✅

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name Size Change Total Size Change (%)
assets/dist-*.js -152 bytes 104 bytes -59.38%
assets/dist-*.js -139 bytes 137 bytes -50.36%
assets/dist-*.js -218 bytes 169 bytes -56.33%
assets/dist-*.js 14 bytes 183 bytes 8.28% ⚠️
assets/dist-*.js 73 bytes 256 bytes 39.89% ⚠️
assets/dist-*.js -83 bytes 176 bytes -32.05%
assets/dist-*.js 82 bytes 259 bytes 46.33% ⚠️
assets/dist-*.js 283 bytes 387 bytes 272.12% ⚠️
assets/dist-*.js -33 bytes 104 bytes -24.09%
assets/dist-*.js 81 bytes 183 bytes 79.41% ⚠️
assets/dist-*.js -19 bytes 164 bytes -10.38%
assets/dist-*.js 239 bytes 403 bytes 145.73% ⚠️
assets/dist-*.js 8 bytes 177 bytes 4.73%
assets/dist-*.js 65 bytes 169 bytes 62.5% ⚠️
assets/dist-*.js 159 bytes 335 bytes 90.34% ⚠️
assets/dist-*.js -299 bytes 104 bytes -74.19%
assets/dist-*.js 172 bytes 276 bytes 165.38% ⚠️
assets/dist-*.js -233 bytes 102 bytes -69.55%
assets/__vite-*.js -5 bytes 93 bytes -5.1%
assets/__vite-*.js 5 bytes 98 bytes 5.38% ⚠️

@Shamik-07
Copy link
Copy Markdown
Contributor Author

@mscolnick
Apart from the cubic ai's review comment, we started this because expand_dict was having trouble with NaN rows.
We have certainly fixed that, but the issue is regarding deeply nested dicts being expanded by pandas if we are using pd.json_normalize vs. polars unnest will only unnest a single level by default. This will yield discrepancies between both the backend outputs.

Current Behaviour
e.g. df with nested columns
image

polars unnest
image

pandas unnest
image

Proposal
Let's unnest only a single level in pandas thus matching polars unnest.
This will always give us the same backend agnostic output
image

I have added the fix to this PR, please let me know your thoughts.

@Shamik-07
Copy link
Copy Markdown
Contributor Author

Screen.Recording.2026-05-21.at.18.07.581.mov

… of pandas, which raises an error with pd.json_normalise for expand_dict function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New dataframe transform: Polars Native Expand Dict Transformation

2 participants